In this tutorial, we’ll explore how to use R Studio to make basic maps that allow us to visualize cross-national variation in transgender rights. We will be using data from the Trans Rights Indicator Project (TRIP), a new cross-national time-series dataset of transgender rights across the world that has been developed and published by the political scientist Myles Williamson. The TRIP dataset and codebook can be downloaded from the project’s website. The corresponding paper, which discusses the data at greater length, was published in the political science journal Perspectives on Politics under the title “A Global Analysis of Transgender Rights: Introducing the Trans Rights Indicator Project (TRIP)”. Here is the citation information for that paper:
Williamson, Myles. 2024. “A Global Analysis of Transgender Rights: Introducing the Trans Rights Indicator Project (TRIP)”. Perspectives on Politics 22(3): 799-818. https://doi.org/10.1017/S1537592723002827
In the tutorial, we will recreate the categorical maps in Figure 3 and Figure 4 of that paper, which display information about whether countries across the world allow for legally recognized gender transitions in 2000 and 2021, respectively. We’ll then create a map that shows whether countries have passed broad-based anti-discrimination laws that protect the transgender community and their rights. After that, we’ll make another map of an overall index of transgender rights that aggregates information in the dataset into a comprehensive index of how “trans friendly” a country’s laws and policies are in the year 2021. Finally, we’ll conclude with an exercise in which you will be invited to select a transgender right of interest from the dataset, and create your own map of cross-national variation in protections for that right.
Please download the transgender rights datasets and associated codebook from the TRIP website to your computer. Save the materials in a directory that you created specifically for this workshop. Note that when downloaded, the file names contain spaces. File names with spaces can cause problems when reading them into R, so please modify the file name of the specific dataset we’ll work with (“Trip Scores.xlsx”) so that it doesn’t contain any spaces. The most straightforward option is to change the file name to “TripScores.xlsx”.
Once you’ve downloaded the materials from TRIP into a directory
dedicated to this workshop and changed the file name of the dataset
we’ll work with to remove spaces (“Trip Scores.xlsx” to
“TripScores.xlsx”), please set this directory as your working directory
in R. Essentially, a working directory is the location on your computer
where R will look for files to read in, as well as the location where it
will export files from R. To check your current working directory, you
can use the getwd() function, which will print the file
path to your console. If you are familiar with the concept of a file
path, you can pass the path to the directory which contains the workshop
data as an argument to the setwd() function in order to set
it as your working directory, i.e. setwd("filepath").
However, if you are unfamiliar with the idea of the working directory,
it would be easier to set your working directory using the R Studio
menu. To do so, click the Session menu, scroll down to
Set Working Directory, and then click Choose
Directory. You will then be taken to a menu where you can
select the directory which you would like to designate as your working
directory.
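You can also carry out these steps in code. The sketch below prints the working directory, lists the files R can see there, and renames the TRIP file if it still has its downloaded name. The path given to setwd() is a placeholder that you would replace with your own path, which is why that line is commented out.

```r
# Print the current working directory
getwd()

# Set the working directory by passing a file path as a string.
# "path/to/workshop_directory" is a placeholder; replace it with your own path.
# setwd("path/to/workshop_directory")

# List the files R can see in the working directory
list.files()

# If the TRIP file still has its downloaded name, rename it to remove the space
if (file.exists("Trip Scores.xlsx") && !file.exists("TripScores.xlsx")) {
  file.rename(from = "Trip Scores.xlsx", to = "TripScores.xlsx")
}

# Check whether the renamed dataset is where R expects it (TRUE or FALSE)
file.exists("TripScores.xlsx")
```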
R is an open-source programming language for statistical computing that allows users to carry out a wide range of data analysis and visualization tasks (among other things). One of the big advantages of using R is its very large user community among social scientists and statisticians, who frequently publish R packages. One might think of packages as workbooks of sorts. Each contains a well-integrated set of R functions, scripts, data, and documentation, and is designed to facilitate certain tasks or implement given procedures. These packages are then shared with the broader community, and anyone who needs to accomplish the tasks that a package addresses can use it in their own projects. The ability to use published packages considerably simplifies the work of applied social scientists using R. They rarely have to write code entirely from scratch, and can build on the code that others have published in the form of packages. This allows applied researchers to focus on substantive problems without getting too bogged down in complicated programming tasks.
In the context of this tutorial, generating maps of transgender rights based on a published tabular dataset would be quite complex if we had to write all our code from scratch. However, because we are able to make use of mapping and visualization packages written by other researchers, the task is considerably simpler, and will not require any complicated programming.
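In order to process our data and make our maps, we will use a variety of packages. They are:
- WDI: an interface to the World Bank’s World Development Indicators
- sf: tools for reading, writing, and manipulating spatial vector data (“simple features”)
- tmap: functions for making static and interactive thematic maps
- rnaturalearth: access to Natural Earth map data, including country boundaries
- rnaturalearthdata: Natural Earth datasets that rnaturalearth draws on
- tidyverse: a collection of data science packages, including dplyr, which we use for filtering and joining data
- readxl: functions for reading Excel files into R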
To install a package in R, pass the name of the package (within
quotation marks) to the install.packages() function. For
example, let’s say you don’t have tmap installed. You can
install it with the following:
# Installs the tmap package
install.packages("tmap")
A function is essentially a programming construct that takes a
specified input, runs this input (called an “argument”) through a set of
procedures, and returns an output. In the code block above, the name of
the package we wanted to install (here, “tmap”) was enclosed within
quotation marks and passed as an argument to
install.packages; this downloaded the tmap package and
installed it on your computer.
Repeat that process for any packages you don’t have installed.
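If several packages are missing, a minimal sketch like the following installs only the ones not yet on your computer. The package names are taken from the library() calls used in this tutorial. installed.packages() is a base R function that lists what is already installed.

```r
# Packages used in this tutorial
required_packages <- c("WDI", "sf", "tmap", "rnaturalearth",
                       "rnaturalearthdata", "tidyverse", "readxl")

# Identify which of them are not yet installed on this computer
missing_packages <- setdiff(required_packages, rownames(installed.packages()))

# Install only the missing ones (nothing happens if none are missing)
if (length(missing_packages) > 0) {
  install.packages(missing_packages)
}
```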
After all the packages are installed, we must load them into memory.
We can think of the process of loading installed packages into a current
R environment as analogous to opening up an application on your phone or
computer after it has been installed (even after an application has been
installed, you can’t use it until you open it!). To load (i.e. “open”)
an R package, we pass the name of the package we want to load as an
argument to the library() function. Below, we load all of
the required packages into memory:
# Loads required libraries
library(WDI)
library(sf)
library(tmap)
library(rnaturalearth)
library(rnaturalearthdata)
library(tidyverse)
library(readxl)
At this point, the packages are loaded and ready to go! One important thing to note regarding the installation and loading of packages is that we only have to install packages once; after a package is installed, there is no need to subsequently reinstall it, except in particular circumstances (for instance, if you update or reinstall R on your computer). However, we must load the packages we need (using the library function) every time we open a new R session. In other words, if we were to close R Studio at this point and open it up later, we would not need to install these packages again, but would need to load the packages again (3.5).
Note that the code blocks in this tutorial usually have comments, prefaced by a hash (“#”). When writing code in R (or most other programming languages), it is good practice to accompany one’s code with brief comments that describe what a block of code is doing. These comments allow someone else (or your future self) to read and understand the code more quickly than would otherwise be the case. The hash before a comment tells R that the rest of that line is a comment and should be ignored when running a script. If one does not preface the comment with a hash, R will try to run the comment as code and will typically throw an error message.
Finally, before proceeding, we will use the following code to disable spherical geometries within the sf package, which will allow us to map our data with the tmap package.
# disable spherical geometries
sf_use_s2(use_s2 = FALSE)
## Spherical geometry (s2) switched off
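To confirm that the setting took effect, you can call sf_use_s2() with no argument, which reports whether spherical geometry is currently on. Writing FALSE in full is generally safer than the abbreviation F, since F is an ordinary variable that can be reassigned.

```r
library(sf)

# Turn off spherical geometry (s2), so that sf uses planar operations instead
sf_use_s2(FALSE)

# Called without an argument, sf_use_s2() reports the current setting
sf_use_s2()

# If you later need spherical geometry again, you can switch it back on with:
# sf_use_s2(TRUE)
```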
Before proceeding, it is useful to briefly consider the concept of object assignment, which will make the subsequent sections easier to follow. Consider the following example:
# assign value 5 to new object named x
x<-5
In the code above, we used R’s assignment operator
(<-, i.e. a left-facing arrow) to assign the value 5 to
an object named “x.” Now that an object named “x” has been created and
assigned the value 5, printing “x” in our console (or printing “x” in
our script and running it) will return the value 5:
# Print value assigned to object "x"
x
## [1] 5
More generally, the process of assignment effectively equates the
output created by the code on the right side of the assignment operator
(<-) to an object with a name that is specified on the
left side of the assignment operator. Whenever we want to look at the
value assigned to an object (i.e. the output created by the code to the
right side of the assignment operator), we simply print the name of the
object in the R console (or print the name and run it within a
script).
While the example above was very simple, we can assign the output of virtually any R code (such as datasets, maps, or graphs) to an R object. Indeed, we’ll use the basic principle of object assignment introduced above to assign the datasets we’ll import below to new objects. Note that object names are arbitrary and could be virtually anything, but it is good practice for object names to describe their contents. If the concept of object assignment is new, it will begin to make more sense as we go.
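As a small illustration of assigning different kinds of output to objects, the sketch below stores a vector, a tiny data frame, and the result of a function call. The country rows are purely illustrative.

```r
# Assign a numeric vector (the two years mapped in this tutorial) to an object
years_of_interest <- c(2000, 2021)

# Assign a small tabular dataset to an object; the rows are illustrative only
example_countries <- data.frame(
  country = c("Argentina", "Germany", "India"),
  iso_a3  = c("ARG", "DEU", "IND")
)

# Assign the output of a function (the number of rows) to a new object
number_of_countries <- nrow(example_countries)

# Printing an object's name returns its contents
number_of_countries  # [1] 3
```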
Now that we’ve taken care of these preliminary steps, let’s load our data into R Studio.
Before turning again to the global dataset of transgender rights from TRIP (which we introduced earlier), we will first load a spatial dataset of country boundaries into R and learn how to work with such datasets. After that, we’ll return to the TRIP data and learn how to join it to the spatial dataset of country boundaries, which will allow us to visualize the TRIP data on a global map.
When working with spatial data in R, we will sometimes want to import
data that is stored on our computer. There are several functions in the
sf package that will allow us to easily import saved or downloaded
spatial data into R; the most commonly used sf package
function to load saved spatial vector data into R is the
st_read() function. For more details, please consult the
st_read() function’s documentation by typing
?st_read in the console.
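This tutorial does not ship a spatial file of its own, so the sketch below demonstrates st_read() on a sample shapefile that is installed with the sf package (North Carolina county boundaries). The object names nc_path and nc_counties are our own.

```r
library(sf)

# Path to a sample shapefile that is installed along with the sf package
# (county boundaries for the US state of North Carolina)
nc_path <- system.file("shape/nc.shp", package = "sf")

# Read the saved vector data into R as an sf object;
# quiet = TRUE suppresses the summary that st_read() normally prints
nc_counties <- st_read(nc_path, quiet = TRUE)

# The result is both an sf object and a data frame
class(nc_counties)
```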
In our case, however, we won’t have to download and import the
spatial data we need into R Studio from our computer’s local drive. That
is because there are R packages that already provide this spatial data,
and allow us to directly load it into memory. In particular, we’ll use
the ne_countries() function of the rnaturalearth package to
bring a spatial dataset of country borders into our R environment, and
then assign it to an object that we will name
country_boundaries:
# Brings spatial dataset of country boundaries into R environment using the rnaturalearth package, and then assigns this spatial dataset to an object named "country_boundaries"
country_boundaries<-ne_countries(scale="medium", returnclass="sf")
Note the two arguments we pass to the ne_countries function. The “scale” argument specifies the level of detail of the country boundaries: here, Natural Earth’s medium (1:50 million) scale. The other options are ‘small’ and ‘large’, and the large scale additionally requires the rnaturalearthhires package. The “returnclass” argument specifies that we want the spatial dataset returned as an sf object.
Now that we have our spatial dataset of country boundaries loaded
into our R environment and assigned to the new
country_boundaries object, let’s open up this dataset and
see what it looks like. A convenient way to view a dataset in R Studio is to
pass the name of the relevant object to the View()
function, which will open up the dataset in R Studio’s built-in data
viewer.
# View "country_boundaries" data in R Studio Data Viewer
View(country_boundaries)
By scrolling across the dataset, you’ll note that each row corresponds to a country, and that there are many columns that correspond to various country-level attributes. The crucial column, however, which makes this a spatial dataset (as opposed to merely a tabular one), is the column labeled “geometry”. This column contains geographic coordinate information that defines a polygon for each country in the dataset. Note that the “geometry” column is likely one of the last columns in the dataset, so you may have to scroll a bit to find it.
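If you prefer to inspect the dataset from the console rather than the viewer, a few standard functions give a quick summary. The sketch below repeats the ne_countries() call shown earlier. st_drop_geometry() returns the attribute table as an ordinary data frame, which is convenient when you only want to look at the non-spatial columns.

```r
library(sf)
library(rnaturalearth)

country_boundaries <- ne_countries(scale = "medium", returnclass = "sf")

# Number of rows (countries) and columns (attributes plus geometry)
dim(country_boundaries)

# All column names; the "geometry" column is usually near the end
names(country_boundaries)

# Name of the column that sf treats as the active geometry
attr(country_boundaries, "sf_column")

# Drop the geometry to get a plain attribute table, then look at two columns
country_attributes <- st_drop_geometry(country_boundaries)
head(country_attributes[, c("name", "iso_a3")])
```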
To observe the information in the “geometry” column more clearly, we
can extract that specific column. The dollar sign ($) is
the R operator that allows us to extract a specified column; below, we
are extracting the “geometry” column from the dataset assigned to the
country_boundaries object:
# Extracts "geometry" column from country_boundaries
country_boundaries$geometry
## Geometry set for 241 features
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -180 ymin: -89.99893 xmax: 180 ymax: 83.59961
## Geodetic CRS: +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
## First 5 geometries:
## MULTIPOLYGON (((-69.89912 12.452, -69.8957 12.4...
## MULTIPOLYGON (((74.89131 37.23164, 74.84023 37....
## MULTIPOLYGON (((14.19082 -5.875977, 14.39863 -5...
## MULTIPOLYGON (((-63.00122 18.22178, -63.16001 1...
## MULTIPOLYGON (((20.06396 42.54727, 20.10352 42....
Note that extracting the “geometry” column prints some useful metadata. It tells us that the dataset has 241 features, and that it represents spatial information as polygons (geometry type: MULTIPOLYGON). It also provides information on the dataset’s coordinate reference system (“CRS”). Roughly speaking, a coordinate reference system describes how actual locations on the Earth correspond to points on a two-dimensional map. CRSs are a crucial concept when carrying out geospatial analysis, but we won’t go into them in detail, since you won’t need an in-depth understanding of them for basic cartography. For now, what is important to notice is that the “geometry” column consists of a set of geographic coordinates for each row (each row corresponds to a distinct country). We can use this information to draw georeferenced polygons for each country/row in the spatial dataset, which will yield a world map!
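If you want to check the CRS and geometry types without printing the geometry column, sf provides dedicated functions. The sketch below is only a quick inspection and does not change the data.

```r
library(sf)
library(rnaturalearth)

country_boundaries <- ne_countries(scale = "medium", returnclass = "sf")

# Print the full coordinate reference system stored with the data
st_crs(country_boundaries)

# TRUE when coordinates are unprojected longitude/latitude degrees
st_is_longlat(country_boundaries)

# Tabulate the geometry type of every feature
table(st_geometry_type(country_boundaries))
```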
To translate the information in the “geometry” column of the dataset
into a cartographic representation, we’ll use the tmap package.
In particular, we’ll use the tm_shape and
tm_polygons functions from tmap, which are
connected by a plus sign (+). The argument passed to the
tm_shape function is the name of the object associated with
the spatial dataset (country_boundaries, defined above). In
addition, the tm_polygons function indicates that the
spatial data is to be represented using polygons (as opposed to
alternatives such as lines or points), and does not require any
arguments (we’ll add some optional arguments to customize the map’s
appearance in just a bit). When we type in and run the following code
from our script, the result is a map that is rendered based on the
information in the “geometry” column of country_boundaries:
# maps geographic features (i.e. countries) of "country_boundaries" as polygons using tmap package functions
tm_shape(country_boundaries)+
tm_polygons()
If you don’t like the grey polygons, you can specify a desired color
within the tm_polygons() function. For guidance on working
with colors in R (including information on color and palette codes), see
this extremely useful R
Color Cheatsheet, by Melanie Frazier.
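Before using a color name in a map, it can help to confirm that R recognizes it. Base R’s colors() function returns every built-in color name, and col2rgb() converts a name to its red/green/blue values. The sketch below checks the color names used in this tutorial.

```r
# Every named color built into R, as a character vector
built_in_colors <- colors()

# Check that the names used in this tutorial are valid color names (all TRUE)
c("darkorange", "cadetblue2", "grey90", "grey70",
  "lightgreen", "mediumpurple", "purple4") %in% built_in_colors

# Find all variants of a color family by partial name
grep("cadetblue", built_in_colors, value = TRUE)

# Convert a color name to its hexadecimal code: returns "#FF8C00"
rgb(t(col2rgb("darkorange")), maxColorValue = 255)
```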
For example, let’s say we want to draw the polygons in the color associated with “darkorange” on the cheat sheet. We can use the following:
# maps geographic features (i.e. countries) of "country_boundaries" as polygons using tmap package functions; polygons rendered in "darkorange"
tm_shape(country_boundaries)+
tm_polygons("darkorange")
Or, say we prefer the color associated with the label “cadetblue2”:
# Maps country polygons from "country_boundaries" in "cadetblue2"
tm_shape(country_boundaries)+
tm_polygons("cadetblue2")
Just as we can assign datasets or numeric values to objects, so too
with maps. For example, let’s say we want to assign the orange world map
we generated above to an object named world_map_orange:
# assigns dark orange world map to object named "world_map_orange"
world_map_orange<-tm_shape(country_boundaries)+
tm_polygons("darkorange")
Now, whenever we want to bring up that particular map, we can simply print the name of the object, and the map will render in the “Plots” tab of the R Studio interface (on the bottom-right of the screen):
# prints contents of "world_map_orange"
world_map_orange
One of the nice things about tmap is that it allows us to
toggle back and forth between static print maps, and dynamic interactive
maps that allow users to zoom in/out, pan around, view attribute
characteristics etc. All you have to do to generate an interactive map
is use the tmap_mode() function to shift into “view” mode
with the following:
# set tmap mode to "view"
tmap_mode("view")
## tmap mode set to interactive viewing
Now, our tmap code outputs will yield a dynamic map:
# prints contents of "world_map_orange" in "view" mode
world_map_orange
This map can easily be saved as an html document, and subsequently embedded on a website.
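Here is a minimal sketch of saving the interactive version, rebuilding the world_map_orange object defined above. tmap_save() writes an interactive web map when the file name ends in .html. The output file name is our own choice. Depending on your package versions, writing a self-contained HTML file may rely on htmlwidgets (and possibly pandoc). If you see an error to that effect, one option to try is the selfcontained = FALSE argument.

```r
library(sf)
library(tmap)
library(rnaturalearth)

sf_use_s2(FALSE)
country_boundaries <- ne_countries(scale = "medium", returnclass = "sf")
world_map_orange <- tm_shape(country_boundaries) +
  tm_polygons("darkorange")

# A file name ending in ".html" tells tmap_save() to write an interactive map
tmap_save(tm = world_map_orange, filename = "world_map_orange.html")
```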
If we want to shift back to a static map, simply switch back to
“plot” mode via the same tmap_mode function:
# set tmap mode to "plot"
tmap_mode("plot")
## tmap mode set to plotting
Now, our tmap code will once again yield a static
representation of the spatial information embedded in
country_boundaries:
# prints contents of "world_map_orange" in print mode
world_map_orange
We can edit spatial datasets in R Studio with relative ease, using functions from commonly used data science packages in the tidyverse. Let’s say, for example, that we don’t want Antarctica to appear on our map (Antarctica is often omitted from political maps of the world).
To delete Antarctica from the map, we first need to delete the row
that corresponds to Antarctica in country_boundaries. We
can do so with the following code:
# Deletes Antarctica from "country_boundaries"
country_boundaries_modified<-country_boundaries %>% filter(iso_a3 !="ATA")
We can translate the code above into ordinary language as follows:
“Take the existing country boundaries dataset
(country_boundaries, to the left of the %>%
and to the right of the assignment operator) and then
(%>%, a symbol called a pipe, which is used to chain
together code) select only the countries that are not Antarctica
(filter(iso_a3 !="ATA")). Take this amended (sans
Antarctica) spatial dataset, and assign it to a new object named
country_boundaries_modified
(country_boundaries_modified<-).”
Two things may require additional elaboration:
- %>%. The pipe operator takes the output of the code on its left and uses that output as an input to the code on its right. Here, the pipe takes the country_boundaries spatial object on its left and feeds this data into the filter() function on its right. In other words, the pipe operator links the code on its two sides, and establishes that the data to be “filtered” within the filter function is country_boundaries.
- filter(). The filter() function comes from the dplyr package and allows one to select rows from a dataset using specified criteria. In our case, we want to select all rows from the dataset that are not Antarctica. The argument passed to the filter function, iso_a3 !="ATA", is essentially saying “return any records where the ‘iso_a3’ variable (i.e. the 3-letter ISO country code) in the attribute table is NOT equal to ‘ATA’ (Antarctica’s code).” Note that != is R syntax for “not equal to”. If we were to instead type filter(iso_a3=="ATA"), the function would select only the Antarctica row from the dataset and discard everything else.

Now, let’s go ahead and map the revised
country_boundaries_modified object:
# maps updated "country_boundaries_modified" object
tm_shape(country_boundaries_modified)+
tm_polygons()
Notice that Antarctica is no longer mapped, since the Antarctica
record is not in the country_boundaries_modified object
that contains the underlying data.
However, Antarctica is still in the country_boundaries
object, which we can confirm with the following:
# maps "country_boundaries" object
tm_shape(country_boundaries)+
tm_polygons()
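To see the pipe and filter() at work a little more, the sketch below shows that the piped and nested forms give the same result. It also uses %in% to drop more than one feature at once; “ATF” (the French Southern and Antarctic Lands) is just an example. Note that filter() keeps only rows where the condition is TRUE, so rows where it evaluates to NA are dropped as well.

```r
library(sf)
library(dplyr)
library(rnaturalearth)

sf_use_s2(FALSE)
country_boundaries <- ne_countries(scale = "medium", returnclass = "sf")

# The pipe passes the left-hand object as the first argument of filter(),
# so these two lines perform the same operation
via_pipe   <- country_boundaries %>% filter(iso_a3 != "ATA")
via_nested <- filter(country_boundaries, iso_a3 != "ATA")

# Drop several features at once with %in% and the negation operator !
without_polar <- country_boundaries %>% filter(!iso_a3 %in% c("ATA", "ATF"))

# Compare row counts before and after filtering
nrow(country_boundaries)
nrow(via_pipe)
nrow(without_polar)
```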
Now that we loaded and explored our world map, it’s time to read in
the TRIP dataset on transgender rights into our R environment and assign
it to an object. You should have already downloaded the data to a
dedicated workshop directory, changed the filename of the dataset we’ll
be working with, and set the dedicated workshop directory as your R
working directory. At this point, we can use the
read_excel() function (since the dataset is an Excel file)
to read the “TripScores.xlsx” dataset into R and assign it to an
object, which we’ll name trips:
# reads the "TripScores.xlsx" Excel file from the working directory into R Studio using the "read_excel()" function and assigns it to a new object named "trips"
trips<-read_excel("TripScores.xlsx")
View(trips)
# filter by year to get 2000 data
trips_2000<-trips %>% filter(year==2000)
# Joins "trips_2000" to "country_boundaries" using 3-letter ISO codes; these ISO codes are contained in a column named "iso_a3" in "country_boundaries", and "country_text_id" in "trips_2000"; the product of the join is assigned to a new object that is named "trips_2000_spatial"
trips_2000_spatial<-left_join(country_boundaries, trips_2000,
by=c("iso_a3"="country_text_id"))
# replicates 2000 gmc map from paper (figure 3)
gmc_2000_map_replication<-tm_shape(trips_2000_spatial)+
tm_polygons(col="gmc",
style="cat",
title="",
palette=c("grey90", "grey70"),
colorNA="white",
textNA="No Data",
labels=c("Not possible/specified", "Possible, de jure"))+
tm_layout(frame=FALSE,
legend.outside=TRUE,
legend.text.size=0.6,
main.title="National Laws allowing legal gender marker change, 2000",
main.title.size = 0.8,
main.title.position = 0.2,
inner.margins=c(0.06, 0.1, 0.1, 0.08))
# prints "gmc_2000_map_replication"
gmc_2000_map_replication
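Countries with no matching code in the TRIP data show up as “No Data”, so it is worth checking which codes failed to match after a join. The sketch below uses dplyr’s anti_join() with a hypothetical set of three codes standing in for the TRIP data (“ZZZ” is deliberately invalid); with the real data you would pass trips_2000 instead. It also lists boundary rows whose iso_a3 is the placeholder “-99”. In some releases of the Natural Earth data, a few countries (France and Norway are commonly cited) carry this placeholder, in which case a column such as adm0_a3 may be a better join key. Treat that as something to verify against your own version of the data.

```r
library(sf)
library(dplyr)
library(rnaturalearth)

country_boundaries <- ne_countries(scale = "medium", returnclass = "sf")

# Hypothetical stand-in for trips_2000: only the key column, using the same
# column name as the TRIP data; "ZZZ" is a deliberately invalid code
toy_trips <- tibble(country_text_id = c("ARG", "DEU", "ZZZ"))

# Rows of the rights data whose code has no match in the boundaries
unmatched_codes <- anti_join(
  toy_trips,
  st_drop_geometry(country_boundaries),
  by = c("country_text_id" = "iso_a3")
)
unmatched_codes$country_text_id

# Boundary rows carrying the "-99" placeholder; these cannot match any ISO code
placeholder_rows <- country_boundaries %>%
  st_drop_geometry() %>%
  filter(iso_a3 == "-99") %>%
  select(name, iso_a3, adm0_a3)
placeholder_rows
```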
Recall that we can also make an interactive map using tmap.
Let’s make an interactive version of the map assigned to
gmc_2000_map_replication. To do so, we’ll first change the
tmap mode to “view” within the tmap_mode() function:
# changes tmap mode to "view"
tmap_mode("view")
## tmap mode set to interactive viewing
Now, we can simply print the name of the
gmc_2000_map_replication object and the map assigned to it
will appear in interactive mode:
# makes interactive version of "gmc_2000_map_replication"
gmc_2000_map_replication
Before continuing, we’ll return to “plot” mode so that subsequent maps will appear as static maps:
# returns to "plot" mode
tmap_mode("plot")
Based on what you learned in the previous section, see if you can create a map of 2021 gender marker change laws (gmc); in other words, see if you can replicate Figure 4 in the paper that describes the TRIP dataset. Your final product would look something like this:
As an additional exercise, see if you can modify the color scheme in the map you created above. Remember to consult the R Color Cheat Sheet. What colors did you choose to represent the categories and why?
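# maps overall TRIP scores; uses the "trips_2021_spatial" object created in the solution to the exercise in Section 5.1
transgender_rights_overall_map<-tm_shape(trips_2021_spatial)+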
tm_polygons(col="trip_score",
title="TRIP Score",
breaks=c(0, 3, 8.01, 13.01),
labels=c("Minimal Protections", "Moderate Protections", "Robust Protections"),
palette=c("lightgreen", "mediumpurple", "purple4"),
colorNA="white",
textNA="No Data")+
tm_layout(frame=FALSE,
legend.outside=TRUE,
legend.text.size=0.6,
main.title="Legal Protections for Transgender Rights",
main.title.size = 0.8,
main.title.position = 0.2,
inner.margins=c(0.06, 0.1, 0.1, 0.08))
# prints "transgender_rights_overall_map"
transgender_rights_overall_map
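The breaks argument assigns each score to a bin. Assuming tmap’s default of left-closed intervals (each bin includes its lower bound but not its upper bound), the binning can be mimicked in base R with cut(). The scores below are hypothetical, chosen to show where the boundaries fall.

```r
# Break points and labels copied from the map code above
trip_breaks <- c(0, 3, 8.01, 13.01)
trip_labels <- c("Minimal Protections", "Moderate Protections", "Robust Protections")

# Hypothetical scores chosen to sit on either side of each break
example_scores <- c(0, 2, 3, 8, 9, 13)

# right = FALSE closes intervals on the left: [0, 3), [3, 8.01), and,
# with include.lowest = TRUE, the last bin is closed on both ends: [8.01, 13.01]
binned_scores <- cut(example_scores,
                     breaks = trip_breaks,
                     labels = trip_labels,
                     right = FALSE,
                     include.lowest = TRUE)

data.frame(score = example_scores, category = binned_scores)
```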
Finally, once we have made our map(s) in R Studio, and everything looks satisfactory, we’ll want to export them, so that they can be shared, embedded in papers etc.
One easy way to export your maps is to use the “Export” button within the “Plots” window of your R Studio interface. Once you click on the “Export” button, things are fairly self-explanatory; you click through a few menus, and can save (to a specified location on your computer) the map displayed in the “Plots” window as a PDF or as an image file.
If you prefer to export your map programmatically (which may be
preferable, from a reproducibility standpoint), you have a few options.
The easiest programmatic option is to use the tmap_save
function, which is a part of tmap. For example, recall the map
assigned to gmc_2000_map_replication, which was a
replication of Figure 3 in the paper; if you need to refresh your
memory, it looks like this:
gmc_2000_map_replication
Now, let’s export this map using the following code:
# exports map assigned to "gmc_2000_map_replication" object to working directory as PDF file
tmap_save(tm=gmc_2000_map_replication,
filename="trip_gmc_map_2000.pdf",
width=1920,
height=1080)
## Map saved to /Users/adra7980/Documents/git_repositories/taw_mapping/exported_maps/trip_gmc_map_2000.pdf
## Size: 6.388889 by 3.597222 inches
Above, the first argument to tmap_save is the name of
the map object we want to export
(gmc_2000_map_replication). The second argument is the file
name we want to use for the exported file (along with the desired
extension), and the “width” and “height” arguments specify the
dimensions of the exported map. When this code is run, the map is
exported to your working directory. Note that if we wanted our map as an
image file, we could have simply specified a different file extension
(e.g. .png instead of .pdf). By default, tmap_save treats
“width” and “height” values greater than 50 as pixels (and smaller
values as inches); the “units” argument lets you set the units
explicitly. It’s best to experiment with different values for the
“width” and “height” arguments until you get the exported map looking
the way you want. Other arguments to tmap_save() allow you to further
customize your exported map. We won’t review them here, but you can
learn more by looking at the function’s documentation by
typing ?tmap_save. If you want to export a map with an
inset, you will need to specify the name of the inset map object and
the viewport specifications (via the insets_tm and
insets_vp arguments).
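As a sketch of the image-file route, the code below saves a PNG with the size given explicitly in inches and a resolution of 300 dots per inch. It uses the world_map_orange map from earlier so that it can run without the TRIP file. The file name and dimensions are arbitrary choices.

```r
library(sf)
library(tmap)
library(rnaturalearth)

sf_use_s2(FALSE)
country_boundaries <- ne_countries(scale = "medium", returnclass = "sf")
world_map_orange <- tm_shape(country_boundaries) +
  tm_polygons("darkorange")

# Save as PNG, giving the size in inches and the resolution explicitly
tmap_save(tm = world_map_orange,
          filename = "world_map_orange.png",
          width = 10,
          height = 5,
          units = "in",
          dpi = 300)
```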
Solution to Exercise in Section 5.1
To create the map in Section 5.1 (which is a replication of the map in Figure 4 of the TRIP paper), you could use the following code:
# filter by year to get 2021 data
trips_2021<-trips %>% filter(year==2021)
# Joins "trips_2021" to "country_boundaries" using 3-letter ISO codes; these ISO codes are contained in a column named "iso_a3" in "country_boundaries", and "country_text_id" in "trips_2021"; the product of the join is assigned to a new object that is named "trips_2021_spatial"
trips_2021_spatial<-left_join(country_boundaries, trips_2021,
by=c("iso_a3"="country_text_id"))
# replicates gmc map from paper (figure 4)
gmc_2021_map_replication<-tm_shape(trips_2021_spatial)+
tm_polygons(col="gmc",
style="cat",
title="",
palette=c("grey90", "grey70"),
colorNA="white",
textNA="No Data",
labels=c("Not possible/specified", "Possible, de jure"))+
tm_layout(frame=FALSE,
legend.outside=TRUE,
legend.text.size=0.6,
main.title="National Laws allowing legal gender marker change, 2021",
main.title.size = 0.8,
main.title.position = 0.2,
inner.margins=c(0.06, 0.1, 0.1, 0.08))
# prints "gmc_2021_map_replication"
gmc_2021_map_replication
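Because the 2000 and 2021 maps differ only in the year, one option is to wrap the steps in a function. The helper below, make_gmc_map(), is hypothetical: it is not part of the TRIP materials or the original tutorial. It simply bundles the filter, join, and tmap calls shown above, and builds the map title from the year.

```r
library(sf)
library(dplyr)
library(tmap)

# Hypothetical helper: builds a gender marker change (gmc) map for any year
make_gmc_map <- function(trip_data, boundaries, map_year) {
  # Keep only the requested year of the TRIP data
  year_data <- trip_data %>% filter(year == map_year)

  # Attach that year's data to the country polygons by 3-letter ISO code
  year_spatial <- left_join(boundaries, year_data,
                            by = c("iso_a3" = "country_text_id"))

  # Same map specification as the Figure 3 / Figure 4 replications
  tm_shape(year_spatial) +
    tm_polygons(col = "gmc",
                style = "cat",
                title = "",
                palette = c("grey90", "grey70"),
                colorNA = "white",
                textNA = "No Data",
                labels = c("Not possible/specified", "Possible, de jure")) +
    tm_layout(frame = FALSE,
              legend.outside = TRUE,
              legend.text.size = 0.6,
              main.title = paste0("National Laws allowing legal gender marker change, ",
                                  map_year),
              main.title.size = 0.8,
              main.title.position = 0.2,
              inner.margins = c(0.06, 0.1, 0.1, 0.08))
}
```

For example, make_gmc_map(trips, country_boundaries, 2021) would rebuild the 2021 map, assuming trips and country_boundaries have been created as above.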